Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information.

نویسندگان

  • S M Hebsgaard
  • P G Korning
  • N Tolstrup
  • J Engelbrecht
  • P Rouzé
  • S Brunak
چکیده

Artificial neural networks have been combined with a rule based system to predict intron splice sites in the dicot plant Arabidopsis thaliana. A two step prediction scheme, where a global prediction of the coding potential regulates a cutoff level for a local prediction of splice sites, is refined by rules based on splice site confidence values, prediction scores, coding context and distances between potential splice sites. In this approach, the prediction of splice sites mutually affect each other in a non-local manner. The combined approach drastically reduces the large amount of false positive splice sites normally haunting splice site prediction. An analysis of the errors made by the networks in the first step of the method revealed a previously unknown feature, a frequent T-tract prolongation containing cryptic acceptor sites in the 5' end of exons. The method presented here has been compared with three other approaches, GeneFinder, Gene-Mark and Grail. Overall the method presented here is an order of magnitude better. We show that the new method is able to find a donor site in the coding sequence for the jelly fish Green Fluorescent Protein, exactly at the position that was experimentally observed in A.thaliana transformants. Predictions for alternatively spliced genes are also presented, together with examples of genes from other dicots, monocots and algae. The method has been made available through electronic mail ([email protected]), or the WWW at http://www.cbs.dtu.dk/NetPlantGene.html

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA.

Prediction of splice site selection and efficiency from sequence inspection is of fundamental interest (testing the current knowledge of requisite sequence features) and practical importance (genome annotation, design of mutant or transgenic organisms). In plants, the dominant variables affecting splice site selection and efficiency include the degree of matching to the extended splice site con...

متن کامل

Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction

Accurate splice site prediction is a critical component of any computational approach to gene prediction in higher organisms. Existing approaches generally use sequence-based models that capture local dependencies among nucleotides in a small window around the splice site. We present evidence that computationally predicted secondary structure of moderate length pre-mRNA subsequencies contains i...

متن کامل

Computational gene prediction using multiple sources of evidence.

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice...

متن کامل

Link Prediction using Network Embedding based on Global Similarity

Background: The link prediction issue is one of the most widely used problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. The link prediction local approaches with node structure objectives are fast in case of speed but are not accurate enough. On the other hand, the global link predicti...

متن کامل

A Genetic Screen Identifies PRP18a, a Putative Second Step Splicing Factor Important for Alternative Splicing and a Normal Phenotype in Arabidopsis thaliana

Splicing of pre-mRNA involves two consecutive trans-esterification steps that take place in the spliceosome, a large dynamic ribonucleoprotein complex situated in the nucleus. In addition to core spliceosomal proteins, each catalytic step requires step-specific factors. Although the Arabidopsis thaliana genome encodes around 430 predicted splicing factors, functional information about these pro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Nucleic acids research

دوره 24 17  شماره 

صفحات  -

تاریخ انتشار 1996